Determining quality of tier assignments

ABSTRACT

Technologies pertaining to computing a tiering policy that defines how digital items are desirably stored across a plurality of different storage tiers are described herein. A data repository that comprises data that is indicative of historic user interaction with a search engine is accessed. Subsequently, a tiering policy for digital items that are retrievable by way of the search engine is computed based at least in part upon the data that is indicative of the historic user interaction with the search engine. Retrieval times for digital items in the data storage tiers differ across the data storage tiers.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 13/210,797, filed on Aug. 16, 2011, which is a continuation of U.S. patent application Ser. No. 11/964,729, filed on Dec. 27, 2007. The entireties of these applications are incorporated herein by reference.

BACKGROUND

Search engines have enabled users to quickly access information over the Internet. Specifically, a user can submit a query to a search engine and peruse ranked results returned by the search engine. For example, a user can provide a search engine with the query “Spider” and be provided with web pages relating to various arachnids, web pages relating to automobiles, web pages relating to films, web pages related to web crawlers, and other web pages. Search engines may also be used to return images, academic papers, videos, and other information to an issuer of a query.

Operation of a search engine may include employment of web crawlers to locate and store a large amount of information (e.g., web pages) that is available on the World Wide Web. For example, web pages or information pertaining thereto may be stored in a search engine index, which is used (in connection with one or more search algorithms) when queries are received.

Conventionally, a search engine index is stored in several tiers, wherein different tiers provide different levels of performance. The tiering of the search engine index is analogous to the memory hierarchy used in computer architecture: overall storage capacity of the index is divided between different levels that vary in size, speed, latency, and cost. Higher tiers of the index typically have higher speed but have smaller capacity and higher cost. Accordingly, it is desirable to carefully index web pages to maximize efficiency of the search engine.

One manner for tiering web pages that has been used is to select a tier of an index in which to place a web page as a function of the web page's relative importance as determined by some metric, such as a static rank of the web page. Specifically, a number of links to a web page may be used to select a tier of an index in which to locate the web page. The relative importance of the page, however, is not necessarily indicative of whether the page is frequently accessed, and thus may be suboptimal for indexing web pages in a search engine index. Evaluating tier assignment is a difficult problem, however, because it is unclear which metrics capture the quality of a particular allocation of web pages to the tiers.

SUMMARY

The following is a brief summary of subject matter that is described in greater detail herein. This summary is not intended to be limiting as to the scope of the claims.

Various technologies relating to tiering digital items (such as web pages) are described herein. User interaction with a search engine, database management system, or the like can be monitored and data can be collected relating to such user interaction. For example, queries submitted by users, search results (e.g., digital items) provided in response to the queries, and user actions with respect to the search results can be monitored and retained. In a particular example, a toolbar on a browser can be used to collect the user history data. Based at least in part upon the user history data, an indication of quality of a tier assignment for searchable digital items can be generated, wherein a tier assignment indicates to which of several tiers searchable digital items are assigned. The indication of quality of the tier assignment may be a value that accords to a defined tier assignment quality metric, which is described in detail herein.

In an example, the indication of quality may be determined by ascertaining several parameters. For instance, the indication of quality of the tier assignment may be based at least in part upon weights that are assigned to observed queries. In an example, the weights may be indicative of relative importance of the queries, and may be based at least in part upon frequency of issuance of the queries. In another example, the indication of quality of the tier assignment may be based at least in part upon a probability that, for a particular query and a determined system load (e.g., how busy a system is when the query is received), retrieval of digital items will end in a specified tier. The probability may be determined for multiple tiers. In yet another example, the indication of quality of the tier assignment may be based at least in part upon a measure of search result quality obtained when retrieval ends in a particular tier. Normalized Discounted Cumulative Gain, Mean Average Precision, Q-measure, or other suitable mechanisms for measuring information retrieval loss or search result quality may be used in connection with determining the measure of tiering quality.

In addition, an improved tier assignment can be generated based at least in part upon the indication of quality of tier assignment and/or the user history data. For example, the indication of quality of tier assignment may conform to a defined tier assignment quality metric, and an improved tier assignment may be optimized or substantially optimized with respect to the metric. Furthermore, a tiering policy can be updated based at least in part upon the improved tier assignment. A tiering policy is a policy that is used to assign digital items to tiers, and can take into account various features that correspond to a digital item, such as a number of times the digital item has been accessed by a user, size of the digital item, and the like. The tiering policy can be updated through the use of machine learning techniques, for example.

Other aspects of the present application will be appreciated upon reading and understanding the attached figures and description.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional block diagram of an example system that facilitates determining an indication of quality of a tier assignment.

FIG. 2 is a functional block diagram of an example component that generates an indication of quality of a tier assignment.

FIG. 3 is a functional block diagram of an example system that facilitates generating an improved tier assignment.

FIG. 4 is a functional block diagram of an example system that facilitates generating an improved tier assignment.

FIG. 5 is a flow diagram that illustrates an example methodology for generating an indication of quality of a tier assignment.

FIG. 6 is a flow diagram that illustrates an example methodology for generating an indication of quality of a tier assignment.

FIG. 7 is a flow diagram that illustrates an example methodology for outputting a tier assignment that is optimized or substantially optimized with respect to a tier assignment quality metric.

FIG. 8 is a flow diagram that illustrates an example methodology for updating a tiering policy.

FIG. 9 is an example computing system.

DETAILED DESCRIPTION

Various technologies pertaining to determining quality of a tier assignment, generating an improved tier assignment, and automatically updating a tiering policy will now be described with reference to the drawings, where like reference numerals represent like elements throughout. In addition, several functional block diagrams of example systems are illustrated and described herein for purposes of explanation; however, it is to be understood that functionality that is described as being carried out by certain system components may be performed by multiple components. Similarly, for instance, a single component may be configured to perform functionality that is described as being carried out by multiple components.

With reference to FIG. 1, an example system 100 that facilitates outputting an indication of quality of a tier assignment with respect to a tiered storage system (not shown) is illustrated. Pursuant to an example, a tiered storage system may be a search engine index with multiple tiers, wherein a first (highest) tier may be more costly and have a relatively small amount of storage space, but retrieval time for digital items retrieved from the first tier may be less than retrieval times for digital items retrieved from other tiers (lower tiers). A second tier may be less expensive and have more storage space than the first tier, but retrieval time may be greater when compared to retrieval time corresponding to the first tier. In another example, the tiered storage system may include tiers of storage used in connection with a database management system. For example, a server used in a database management system may have a hard drive, random access memory, and high-speed random access memory, which can each be a tier.

The system 100 includes a data store 102 that comprises user history data 104. The user history data 104 may include, for example, queries issued by users, search results provided to the users in response to the queries, search results selected by users in response to being provided with the search results, and/or other suitable information. In an example, the user history data 104 can be accumulated by monitoring user interaction with respect to a search engine. For instance, a toolbar plugin may be installed in a browser, and queries entered into the browser may be collected by the toolbar plugin, as well as search results returned in response to the queries, user selection of particular search results, and the sequence of pages viewed by the user after submitting the query.

A receiver component 106 receives a subset of the user history data 104. A quality indicator component 108 is in communication with the receiver component 106 and receives the subset of user history data 104 from the receiver component 106. The quality indicator component 108 can generate an indication 110 of quality of a tier assignment, wherein the tier assignment indicates where digital items are to be assigned in a tiered storage system. For instance, the indication of quality may conform to a tier assignment quality metric, which is described in detail below. In addition, operation of the quality indicator component 108 is described in greater detail below.

Now referring to FIG. 2, an example of the quality indicator component 108 is illustrated. The quality indicator component 108 includes a weight determiner component 202, a load determiner component 204, a tier determiner component 206, and a utility determiner component 208. The weight determiner component 202 determines a weight that is assigned to each query used by the quality indicator component 108 to generate an indication of quality of a tier assignment corresponding to a tiered storage system. In an example, the weight determined by the weight determiner component 202 may be based at least in part upon frequency of issuance of the query (as ascertained from query logs, for example).
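
By way of illustration only, a frequency-based weight of the kind attributed to the weight determiner component 202 might be sketched as follows. The function name `query_weights`, the log format, and the choice to normalize the weights so that they sum to one are assumptions made for this sketch and are not specified herein.

```python
from collections import Counter


def query_weights(query_log):
    """Map each distinct query to its relative frequency in the log."""
    counts = Counter(query_log)
    total = sum(counts.values())
    # Assumption: w(q) is the share of all logged issuances that were q.
    return {query: count / total for query, count in counts.items()}


# Hypothetical query log; each entry is one issuance of a query.
example_log = ["spider", "spider", "weather", "spider", "news"]
weights = query_weights(example_log)
```

Other weightings, such as weights explicitly assigned by a user, could be substituted without changing how the weights are consumed downstream.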

The load determiner component 204 determines the system load observed when a particular query was executed by a search component (e.g., search engine, database system, . . . ). The system load may be based at least in part upon a number of queries processed by the search component while the particular query was processed, a number of processing cycles dedicated to retrieving search results while the particular query was executed, or how “busy” the search component was in general.

The tier determiner component 206 can determine a probability that a certain tier will be the last tier searched over for digital items (with respect to the particular query) under the system load determined by the load determiner component 204. Generally, when a query is entered into a search component (e.g., a search engine), retrieval is first performed in higher tiers that are typically smaller but have faster access and retrieval times when compared to lower tiers. Depending on the number and quality of results obtained in the higher tiers as well as a current system load, retrieval may or may not be performed in lower tiers. Accordingly, as noted above, the tier determiner component 206 can determine a probability that a certain tier will be the last tier searched over for digital items (with respect to the particular query and under the determined system load). The probability can be determined for each tier in a tiered storage system.
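
As one hedged illustration, the per-tier probabilities could be estimated empirically from logged retrievals. The record format (query, load bucket, deepest tier visited), the discretization of load into named buckets, and the function name `estimate_tier_probabilities` are all assumptions of this sketch; no particular estimator is prescribed herein.

```python
from collections import Counter, defaultdict


def estimate_tier_probabilities(records, k):
    """Estimate, per (query, load), the chance retrieval ends in each tier.

    records: iterable of (query, load_bucket, deepest_tier) tuples, where
    deepest_tier is in 1..k. Returns {(query, load): [p_1, ..., p_k]}.
    """
    counts = defaultdict(Counter)
    for query, load, deepest_tier in records:
        counts[(query, load)][deepest_tier] += 1
    probabilities = {}
    for key, tier_counts in counts.items():
        total = sum(tier_counts.values())
        # Tiers never observed as the deepest tier get probability 0.
        probabilities[key] = [tier_counts[t] / total for t in range(1, k + 1)]
    return probabilities


# Hypothetical retrieval log for a three-tier index.
records = [
    ("spider", "low", 1),
    ("spider", "low", 2),
    ("spider", "low", 2),
    ("spider", "high", 1),
]
tier_probs = estimate_tier_probabilities(records, k=3)
```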

The utility determiner component 208 determines an indication of search result quality (with respect to a particular query) when retrieval ends in a certain tier, wherein the indication of search result quality can be computed using any suitable metric. In an example, Normalized Discounted Cumulative Gain (NDCG) can be used to determine the indication of search result quality. In another example, Mean Average Precision (MAP) can be used to determine the indication of search result quality. In yet another example, Q-measure can be used to determine the indication of search result quality. Accordingly, it can be discerned that the utility determiner component 208 can utilize any suitable mechanisms/metrics to determine an indication of search result quality with respect to the particular query when retrieval ends in the certain tier.

The weight determined by the weight determiner component 202, the system load determined by the load determiner component 204, the probability determined by the tier determiner component 206, and the indication of search result quality determined by the utility determiner component 208 may be used by the quality indicator component 108 to determine an indication of quality of a tier assignment.

Pursuant to an example, the following algorithm can be used to define a metric of tier assignment quality, and can be employed by the quality indicator component 108 to determine an indication of quality of a tier assignment:

$$TQ(T(D), L) = \sum_{q \in Q} w(q) \sum_{t=1}^{k} P(t \mid q, T(D), L) \times \mathrm{Utility}(t, q, T(D)), \qquad (1)$$

where D={d₁, . . . , d_(|D|)} is the set of all digital items (d_(i)) that are to be stored in k tiers T₁, . . . , T_(k) that have corresponding capacities |T₁|, . . . , |T_(k)|; t(d_(i)) is the tier assignment for each item in the set of digital items D, where t(d_(i)) can have values 1, . . . , k; T(D)={t(d₁), . . . , t(d_(|D|))} is the overall set of tier assignments; TQ(T(D),L) is a measure of tier assignment quality for a current system load L; Q is a set of all possible queries; w(q) is a weight (e.g., relative importance) of a query q; P(t|q,T(D),L) is the probability that the t-th tier will be the lowest tier visited during retrieval under the current system load L; and Utility(t,q,T(D)) is a measure of search result quality obtained when retrieval ends in the t-th tier. Algorithm (1) thus computes an expectation of overall tier assignment quality over all possible queries for the given tier assignment over the probability distribution of ending retrieval in each tier.

It can be discerned that the number of all possible queries, however, is infinite. Accordingly, a set of observed queries Q′ may be used by the quality indicator component 108 as an approximation of the distribution of all possible queries. In an example, these observed queries Q′ can be randomly selected from a data repository that includes multiple observed queries (e.g., the user history data 104), where the probability of selecting any query q∈Q′ can be computed as the likelihood of selecting a random query received by a search component (e.g., search engine, database management system, . . . ). In another example, the set of observed queries Q′ may be selected such that they are representative of all possible queries. For instance, the queries Q′ may be selected such that a number of queries that have a certain length (as measured in words, characters, or the like) do not exceed a threshold. In addition, queries that are directed at different subject matter can be selected. In yet another example, the queries Q′ may be selected based upon an amount of user data that is associated with such queries. For instance, the queries Q′ may be limited to queries that have sequential user data associated therewith, such as user clicks on one or more search results and/or advertisements that are provided in response to the queries. It is to be understood that any suitable manner for selecting a subset of observed queries is contemplated and intended to fall under the scope of the hereto-appended claims.
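
The random-selection example above can be sketched as drawing individual issuances uniformly from a query log, so that a query is drawn with probability proportional to how often it was issued. The helper name `sample_observed_queries`, the fixed seed, and the log format are assumptions introduced for illustration.

```python
import random


def sample_observed_queries(query_log, draws, seed=0):
    """Draw issuances uniformly from the log and return the distinct queries.

    Because frequent queries occupy more log entries, each draw selects a
    query with probability proportional to its issuance frequency.
    """
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    drawn = rng.choices(query_log, k=draws)
    return set(drawn)


# Hypothetical query log; each entry is one issuance of a query.
example_log = ["spider"] * 6 + ["weather"] * 3 + ["news"]
q_prime = sample_observed_queries(example_log, draws=5)
```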

For every selected query q in Q′, a relevant result set R(q)={d_(q,1), . . . , d_(q,M)} can be constructed by the quality indicator component 108 that includes no more than M items, wherein the items may be partially ordered from most relevant to least relevant. In an example, the result set may incorporate digital items that are frequently selected/visited by users following submission of the query to a search component, where frequency of selection/visitation can be combined with the time that users spent viewing the digital items; and/or digital items returned by a search component as relevant results for the query across all tiers of a tiered storage system.
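
A minimal sketch of constructing such a result set from click data follows. The particular way of combining selection frequency with viewing time (one point per click plus one point per minute viewed) is purely an assumption of this sketch, as are the record format and the name `build_result_set`.

```python
from collections import defaultdict


def build_result_set(click_records, max_items):
    """Return up to max_items item ids, most relevant first.

    click_records: iterable of (item_id, dwell_seconds) for one query.
    """
    score = defaultdict(float)
    for item_id, dwell_seconds in click_records:
        # Hypothetical scoring: each click counts 1, plus 1 per minute viewed.
        score[item_id] += 1.0 + dwell_seconds / 60.0
    # Sort by descending score; ties are broken by item id for determinism.
    ranked = sorted(score, key=lambda item_id: (-score[item_id], item_id))
    return ranked[:max_items]


# Hypothetical click log for a single query.
clicks = [("a", 30), ("b", 180), ("a", 30), ("c", 6)]
result_set = build_result_set(clicks, max_items=2)
```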

Using the queries Q′ and corresponding result sets, the following algorithm can be used to define a metric of tier assignment quality, and can be employed by the quality indicator component 108 to determine an indication of quality of a tier assignment:

$$TQ(T(D), L, Q') = \sum_{q \in Q'} w(q) \sum_{t=1}^{k} P(t \mid q, T(D), L) \times \mathrm{Utility}(t, R(q), T(D)), \qquad (2)$$

where TQ(T(D),L,Q′) is a measure of tier assignment quality for a current system load L with respect to the set of queries Q′; and Utility(t,R(q),T(D)) is a measure of search result quality obtained when retrieval ends in the t-th tier.
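
For illustration, algorithm (2) can be evaluated directly once the weights, per-tier probabilities, and per-tier utilities are available for a fixed tier assignment and load. The dictionary-based inputs and the function name `tier_quality` are assumptions of this sketch, and the numeric values are invented solely to exercise the arithmetic.

```python
def tier_quality(queries, weight, tier_prob, utility):
    """Evaluate the double sum of algorithm (2) for a fixed T(D) and L.

    weight[q]    -> w(q)
    tier_prob[q] -> [P(1|q,T(D),L), ..., P(k|q,T(D),L)]
    utility[q]   -> [Utility(1,R(q),T(D)), ..., Utility(k,R(q),T(D))]
    """
    total = 0.0
    for q in queries:
        # Expected utility over the tier in which retrieval ends.
        expected = sum(p * u for p, u in zip(tier_prob[q], utility[q]))
        total += weight[q] * expected
    return total


# Hypothetical inputs for a two-tier system and two observed queries.
weight = {"spider": 0.75, "news": 0.25}
tier_prob = {"spider": [0.5, 0.5], "news": [1.0, 0.0]}
utility = {"spider": [0.4, 1.0], "news": [0.8, 1.0]}
tq = tier_quality(["spider", "news"], weight, tier_prob, utility)
# 0.75 * (0.5*0.4 + 0.5*1.0) + 0.25 * (1.0*0.8 + 0.0*1.0) = 0.725
```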

As noted above, the quality indicator component 108 can determine an indication of quality of a tier assignment. More particularly, the weight determiner component 202 can determine weights (w) for each query in the set of queries Q′. The load determiner component 204 can determine the system load L present for each query in the set of queries Q′. The tier determiner component 206 can determine P(t|q,T(D),L), and the utility determiner component 208 can determine Utility(t,R(q),T(D)). In an example, utility determiner component 208 can use normalized discounted cumulative gain (NDCG) to determine Utility(t,R(q),T(D)). The utility determiner component 208 can employ other mechanisms to measure utility; examples include Mean Average Precision (MAP), and Q-measure. These examples are not intended to be limiting, as other mechanisms to measure utility may be employed and are contemplated.

In a particular example, the utility determiner component 208 can utilize the following algorithm to determine the measure of search result quality when retrieval ends in the t-th tier, wherein the algorithm is a modification of NDCG:

$$\mathrm{Utility}_{NDCG}(t, R(q), T(D)) = N \sum_{d \in R_{t}(q)} \frac{2^{\mathrm{rel}(d)} - 1}{\log(\mathrm{rank}(d) + 1)} \qquad (3)$$

where N is a normalization factor, R_(t)(q) is the ordered subset of digital items in R(q) stored in tiers 1 through t, rel(d) is a relevance score for digital item d, and rank(d) is the rank position in R_(t)(q) of the digital item. Note that rank(d) can depend on t if more relevant digital items reside in lower (deeper) tiers; these may not be retrieved if retrieval does not go beyond tier t. As noted above, using a modification of NDCG is but one possible measure of search result quality for a particular query given current tier assignments, and other measures can be utilized, such as the proportion of relevant results retrieved, etc.
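
A hedged sketch of algorithm (3) follows. Two choices are assumptions of the sketch rather than statements of the described method: the logarithm is taken base 2, and the normalization factor N is taken to be the reciprocal of the same sum computed over the full result set R(q), so that the utility equals 1 when every item in R(q) is reachable. The names `utility_ndcg`, `tier_of`, and `rel` are likewise illustrative.

```python
import math


def utility_ndcg(t, results, tier_of, rel):
    """Modified-NDCG utility when retrieval ends in tier t.

    results: R(q), ordered from most to least relevant.
    tier_of: item id -> assigned tier (1 is the highest tier).
    rel:     item id -> relevance score.
    """
    def dcg(items):
        # rank is the 1-based position within the list being scored.
        return sum((2 ** rel[d] - 1) / math.log2(rank + 1)
                   for rank, d in enumerate(items, start=1))

    # R_t(q): items of R(q) stored in tiers 1 through t, order preserved.
    retrieved = [d for d in results if tier_of[d] <= t]
    full = dcg(results)  # assumed normalizer: N = 1 / full
    return dcg(retrieved) / full if full > 0 else 0.0


# Hypothetical result set in which the most relevant item sits in tier 2.
results = ["a", "b", "c"]
rel = {"a": 3, "b": 2, "c": 1}
tier_of = {"a": 2, "b": 1, "c": 1}
u1 = utility_ndcg(1, results, tier_of, rel)
u2 = utility_ndcg(2, results, tier_of, rel)
```

In this example the utility for tier 1 is lower than for tier 2 because item "a" is skipped when retrieval stops early, which also shifts the ranks of "b" and "c" as noted above.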

As can be discerned from the above, the user history data 104 (FIG. 1) can be used to construct the set of queries Q′ and the corresponding result set R(q) that can be employed to evaluate a tier assignment. P(t|q,T(D),L) can be instantiated for a particular system to reflect a tiering policy used in a tiered storage system for forwarding queries to the t-th tier under an observed load L, provided a current tier assignment T(D). Then, given alternative tier assignments (e.g., T₁(D) and T₂(D)), a preferred assignment can be selected by computing TQ. Additionally, the quality indicator component 108 can use TQ to investigate the expected quality of search results under varying loads (and thus the quality of tier assignments under different loads), as well as for different instantiations of a tiering policy used for forwarding queries to different tiers, as described in detail below.

Referring now to FIG. 3, an example system 300 that facilitates automatically updating a tier assignment with respect to a tiered storage system is illustrated. The system 300 includes a tiered storage system 302 that may include a plurality of tiers, wherein each of the tiers may be used to store one or more digital items, such as web pages, images, documents, and/or the like. A search component 304 performs searches for digital items stored in the tiered storage system 302 based at least in part upon received queries. For example, the search component 304 can be a search engine that is configured to search through a tiered search index in response to receiving a query. In another example, the search component 304 may be a portion of a database management system used to search tiers of storage (e.g., memory, hard drive, . . . ) in response to receipt of a query. In yet another example, the search component 304 may be a desktop search module used to search items on a computer. Other search components are also contemplated.

The data store 102 retains user history data 104 that can be received from the search component 304. For example, queries provided to the search component 304, user actions upon being provided with search results, and sets of search results provided to the user in response to the query can be stored in the user history data 104. The receiver component 106 receives a subset of the user history data 104. As described above, the quality indicator component 108 can generate the indication 110 of quality of a tier assignment. In an example, the indication 110 may be stored in a computer readable medium upon being generated by the quality indicator component 108.

An update component 306 can receive the indication 110 and output an improved tier assignment 308 based at least in part upon the indication 110. For example, the update component 306 can receive other possible tier assignments and corresponding indications of quality and select a tier assignment that corresponds to a highest indication of quality. For example, the update component 306 may use heuristics to determine an optimal or substantially optimal tier assignment (with respect to a defined tier assignment quality metric). In another example, machine learning techniques, which will be described in greater detail below, can be utilized by the update component 306 to output the improved tier assignment 308. Digital items 310 may then be assigned to the tiered storage system 302 based at least in part upon the improved tier assignment 308.

With more detail relating to the update component 306, the indication 110 of quality of an initial tier assignment can provide a basis for developing algorithms/techniques for identifying improved tier assignments for digital items. Given a space of possible tier assignments T={T⁽¹⁾(D), . . . , T^((N))(D)}, identifying a tier assignment T*(D) that has an optimal or substantially optimal indication of tier quality as output by algorithm (2) can be defined as follows:

$$T^{*}(D) = \underset{T^{(i)}(D) \in T}{\mathrm{argmax}} \; TQ(T^{(i)}(D), L, Q'). \qquad (4)$$

The set of possible tier assignments T can be defined as a set of alternative assignments or groups of assignments that are parameterized by some variables, such as parameters of a static ranking scheme. Then the update component 306 can use machine learning techniques to search a set of alternative assignments to identify one of such assignments as being optimal or substantially optimal. For example, the update component 306 may use a neural network, a regression tree, a Bayesian network, or any other suitable machine learning technique to determine a tiering assignment that optimizes or substantially optimizes the indication 110.
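
When the set of candidate assignments is small enough to enumerate, algorithm (4) reduces to scoring each candidate and keeping the best one, as in the sketch below. The quality function used here (total clicks on items placed in tier 1) is only a stand-in chosen to keep the example short; it is an assumption of the sketch and not the TQ metric itself. The names `select_best_assignment` and `toy_quality` are likewise hypothetical.

```python
def select_best_assignment(candidates, quality):
    """Return the candidate assignment with the highest quality value."""
    return max(candidates, key=quality)


# Hypothetical click counts per digital item.
clicks = {"a": 50, "b": 5, "c": 20}


def toy_quality(assignment):
    # Stand-in for TQ: total clicks on items placed in tier 1. A fuller
    # evaluation would instead compute algorithm (2) for the assignment.
    return sum(clicks[d] for d, tier in assignment.items() if tier == 1)


# Each candidate maps item id -> tier; tier 1 holds two items in each.
candidates = [
    {"a": 2, "b": 1, "c": 1},  # tier 1 holds b and c: quality 25
    {"a": 1, "b": 2, "c": 1},  # tier 1 holds a and c: quality 70
    {"a": 1, "b": 1, "c": 2},  # tier 1 holds a and b: quality 55
]
best = select_best_assignment(candidates, toy_quality)
```

For parameterized or very large candidate spaces, exhaustive enumeration would be replaced by a search or learning procedure of the kind mentioned above.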

Furthermore, update component 306 can determine a tiering policy 312 that is used to assign the digital items 310 to particular tiers in the tiered storage system 302 based at least in part upon the improved tier assignment 308 and/or a subset of the user history data 104. A tiering policy may be used to determine which tiers of the tiered storage system 302 to use when storing digital items. For instance, the tiering policy 312 may take into account various features of searchable digital items that may be returned in response to one or more queries. Such features may include a static ranking derived from a link structure (e.g., page rank of a digital item), a rank of a domain that includes the digital item, a popularity of the digital item among search engine results, a number of words in a digital item, color spectrums of images in a digital item, etc. Each of these features may be parameterized by the update component 306. In other words, the features may be assigned weights that are used by the tiering policy 312 to assign a corresponding digital item to a tier of the tiered storage system 302. The update component 306 can use machine learning techniques to learn the weights that are to be assigned to the features, and the tiering policy may be used to assign digital items to tiers of the tiered storage system 302.
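
One simple way a feature-weighted tiering policy could operate is sketched below: each item receives a linear score from its features, and items are placed into tiers in score order subject to tier capacities. The linear form, the feature names, the fixed weights (which in practice would be learned), and the rule that the lowest tier absorbs any overflow are all assumptions of this sketch.

```python
def assign_tiers(items, feature_weights, capacities):
    """Assign items to tiers by descending weighted feature score.

    items:           item id -> {feature name: value}
    feature_weights: feature name -> weight (assumed already learned)
    capacities:      [|T_1|, ..., |T_k|]; the last tier takes any overflow.
    """
    def score(features):
        return sum(feature_weights[name] * value
                   for name, value in features.items())

    ranked = sorted(items, key=lambda d: (-score(items[d]), d))
    assignment = {}
    tier, used = 1, 0
    for d in ranked:
        # Move to the next tier once the current one is full.
        while tier < len(capacities) and used >= capacities[tier - 1]:
            tier, used = tier + 1, 0
        assignment[d] = tier
        used += 1
    return assignment


# Hypothetical features and weights for three pages and a three-tier index.
items = {
    "page_a": {"static_rank": 0.9, "popularity": 0.1},
    "page_b": {"static_rank": 0.2, "popularity": 0.9},
    "page_c": {"static_rank": 0.5, "popularity": 0.5},
}
feature_weights = {"static_rank": 0.3, "popularity": 0.7}
policy_assignment = assign_tiers(items, feature_weights, capacities=[1, 1, 5])
```

With these weights, popularity dominates the score, so "page_b" lands in the highest tier even though "page_a" has the higher static rank.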

With reference now to FIG. 4, an example system 400 that facilitates updating a tier assignment based on multiple possible tier assignments is illustrated. The system 400 includes the quality indicator component 108 that can generate an indication of quality of tier assignments. More specifically, the quality indicator component 108 can generate indications of quality of a first tier assignment 402 through an Nth tier assignment 404 based at least in part upon the user history data 104. The update component 306 can receive the indications of quality (which may be values that correspond to a defined tier assignment quality metric) and combine several different tier assignments in such a manner that a resulting improved tier assignment 406 has a higher quality (as determined by the quality indicator component 108) than any of the individual tier assignments. The update component 306 can combine different tier assignments based at least in part upon the indications of quality corresponding to the tier assignments 402-404 and/or a subset of the user history data 104.

In more detail, combining tier assignments may be a particular instantiation of algorithm (4), where the set T of possible assignments may be a set of possible combinations of individual tier assignments. The set of possible combinations can be parameterized by some variables, such as parameters of a static ranking scheme. The update component 306 can use machine learning techniques to determine a combination of individual tier assignments that is optimal or substantially optimal with respect to a defined tier assignment quality metric. In addition, as discussed above, the update component 306 can generate or update the tiering policy 312 that is used to assign digital items to tiers of a tiered storage system based at least in part upon the improved tier assignment 406.

With reference now to FIGS. 5-8, various example methodologies are illustrated and described. While the methodologies are described as being a series of acts that are performed in a sequence, it is to be understood that the methodologies are not limited by the order of the sequence. For instance, some acts may occur in a different order than what is described herein. In addition, an act may occur concurrently with another act. Furthermore, in some instances, not all acts may be required to implement a methodology described herein.

Moreover, the acts described herein may be computer-executable instructions that can be implemented by one or more processors and/or stored on a computer-readable medium or media. The computer-executable instructions may include a routine, a sub-routine, programs, a thread of execution, and/or the like. In addition, tier assignments in a search engine and/or database management system can be determined based at least in part upon the methodologies described herein. Still further, results of acts of the methodologies may be stored in a computer-readable medium, displayed on a display device, and/or the like.

Referring specifically to FIG. 5, an example methodology 500 for determining an indication of quality of a tier assignment is illustrated. The methodology 500 starts at 502, and at 504 user history data is received. For example, the user history data can include queries that were issued by users, search results provided to the users in response to the queries, user selections of the search results, and the sequence of pages viewed by the user after issuing the query. The user history data may also include labeled data, wherein relevance of search results to queries is explicitly defined by users.

At 506, an indication of quality of a tier assignment is generated based at least in part upon a subset of the user history data. The methodology 500 completes at 508.

Turning now to FIG. 6, a methodology 600 that facilitates determining an indication of quality of a tier assignment with respect to a tiered storage system is illustrated. The methodology 600 starts at 602, and at 604 a weight assigned to a query is determined. For example, the weight may depend on frequency of issuance of the query. In another example, a user or users may explicitly assign a weight to the query to indicate a relative importance of the query.

At 606, a system load background for the query is determined. As noted above, the system load may be related to a number of queries that are being processed by a search component, such as a search engine or database management system, at a time that the query is processed.

At 608, a probability that a certain tier will be a lowest tier visited when the search engine is under the system load is determined. For example, this probability can be determined for each tier used to store searchable digital items.

At 610, an indication of quality of a tier assignment is determined, where the tier assignment is used to store digital items that correspond to the query in a tiered storage system. The indication of quality is determined based at least in part upon the weight, the system load, and the determined tier probability. In an example, the determined indication of quality may be stored, at least temporarily, in a computer-readable medium. The methodology 600 ends at 612.

Referring now to FIG. 7, a methodology 700 for determining an optimal or substantially optimal tier assignment (e.g., optimized or substantially optimized for a defined tier assignment quality metric) is illustrated. The methodology 700 starts at 702, and at 704 a plurality of different tier assignments are received. At 706, user history data is received. As noted above, the user history data may include queries, search results provided in response to the queries, and/or user selections of search results provided in response to the queries.

At 708, indications of quality are determined for a subset of the plurality of different tier assignments. At 710, tier assignments are combined such that the resulting combination has a higher indication of quality than any individual tier assignment. The methodology 700 ends at 712.
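In an example, the combining at 710 may be approached as a local search over the candidate tier assignments. The sketch below is an assumption-laden illustration rather than the combination procedure of the methodology 700: it starts from the best-scoring candidate and adopts a per-item tier placement from another candidate whenever doing so respects hypothetical tier capacities and raises the quality score. The names `combine_assignments`, `within_capacity`, and `quality_fn`, as well as the click counts, are hypothetical. The heuristic guarantees only that the result scores at least as well as the best candidate; a strictly higher score is obtained when at least one move is accepted.

```python
from collections import Counter


def within_capacity(assignment, capacities):
    """Check that no tier holds more items than its (assumed) capacity."""
    used = Counter(assignment.values())
    return all(used[t] <= cap for t, cap in enumerate(capacities))


def combine_assignments(candidates, quality_fn, capacities):
    """Greedily merge candidate tier assignments (item -> tier index)."""
    best = dict(max(candidates, key=quality_fn))
    best_score = quality_fn(best)
    for other in candidates:
        for item, tier in other.items():
            if best.get(item) == tier:
                continue
            # Try adopting this single placement from the other candidate.
            trial = dict(best)
            trial[item] = tier
            if within_capacity(trial, capacities):
                score = quality_fn(trial)
                if score > best_score:
                    best, best_score = trial, score
    return best, best_score


# Illustrative quality metric: total clicks served from the fastest tier.
clicks = {"a": 5, "b": 3, "c": 1}


def quality_fn(assignment):
    return sum(clicks[item] for item, tier in assignment.items() if tier == 0)


candidates = [
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 1},
]
combined, score = combine_assignments(candidates, quality_fn, capacities=[2, 3])
```

Here the combination keeps item "a" in the top tier from the first candidate and adopts the top-tier placement of item "b" from the second, which scores higher than either candidate alone.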

With reference now to FIG. 8, a methodology 800 that facilitates updating a tiering policy is illustrated. In an example, a search engine that uses a tiering policy to assign digital items to tiers of a search engine index may use acts of the methodology 800 to update the tiering policy. The methodology 800 begins at 802, and at 804 user history data is received. At 806, an indication of quality of a tier assignment is determined. At 808, an improved tier assignment is determined based at least in part upon the user history data and/or the indication of quality determined at 806. At 810, a tiering policy is updated based at least in part upon the user history data and the improved tier assignment. For instance, the improved tier assignment may contemplate digital items that are related to the user history search data, and the tiering policy may be used to assign digital items that were not contemplated in the improved tier assignment to particular tiers. The methodology 800 ends at 812.
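In an example, the acts 804 through 810 may be sketched as follows. The sketch is illustrative only and rests on several assumptions that are not drawn from the methodology 800 itself: user history data is reduced to per-item click counts, the indication of quality is the share of clicks served from the fastest tier, the improved tier assignment is obtained by a greedy heuristic, and the tiering policy is a set of per-tier static-rank thresholds fitted to the improved assignment so that items absent from the user history data can still be assigned to a tier. All function names and values are hypothetical.

```python
def top_tier_click_share(assignment, click_counts):
    """Assumed quality indication: fraction of clicks served from tier 0."""
    total = sum(click_counts.values())
    if total == 0:
        return 0.0
    top = sum(n for item, n in click_counts.items() if assignment.get(item) == 0)
    return top / total


def improved_assignment(click_counts, capacities):
    """Greedy heuristic: place the most-clicked items in the fastest tiers."""
    ranked = sorted(click_counts, key=click_counts.get, reverse=True)
    assignment, start = {}, 0
    for tier, cap in enumerate(capacities):
        for item in ranked[start:start + cap]:
            assignment[item] = tier
        start += cap
    return assignment


def update_policy(assignment, static_rank, num_tiers):
    """Fit per-tier static-rank thresholds from an assignment.

    Returns a function mapping an item's static rank to a tier, usable for
    items that the assignment itself does not cover.
    """
    thresholds = []
    for tier in range(num_tiers):
        ranks = [static_rank[i] for i, t in assignment.items() if t == tier]
        thresholds.append(min(ranks) if ranks else float("inf"))

    def policy(item_rank):
        for tier, threshold in enumerate(thresholds):
            if item_rank >= threshold:
                return tier
        return num_tiers - 1  # fall back to the lowest tier

    return policy


click_counts = {"a": 9, "b": 4, "c": 1}          # user history data (804)
static_rank = {"a": 0.9, "b": 0.5, "c": 0.3}     # hypothetical item feature
current = {"a": 1, "b": 0, "c": 1}

current_quality = top_tier_click_share(current, click_counts)   # 806
improved = improved_assignment(click_counts, capacities=[1, 2])  # 808
policy = update_policy(improved, static_rank, num_tiers=2)       # 810
```

The returned `policy` can then be applied to an item that does not appear in the user history data, for example `policy(0.95)` for an item with a high static rank.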

Now referring to FIG. 9, a high-level illustration of an example computing device 900 that can be used in accordance with the systems and methodologies disclosed herein is illustrated. For instance, the computing device 900 may be used in a search engine system. In another example, the computing device 900 may be used in a database management system. The computing device 900 may be a server, or may be employed in devices that are conventionally thought of as client devices, such as personal computers, personal digital assistants, and the like. The computing device 900 includes at least one processor 902 that executes instructions that are stored in a memory 904. The instructions may be, for instance, instructions for implementing functionality described as being carried out by one or more components discussed above or instructions for implementing one or more of the methods described above. The processor 902 may access the memory by way of a system bus 906. In addition to storing executable instructions, the memory 904 may also store digital items, at least a portion of a tier assignment, indications of quality of one or more tier assignments, etc.

The computing device 900 additionally includes a data store 908 that is accessible by the processor 902 by way of the system bus 906. The data store 908 may include executable instructions, one or more tier assignments, indications of quality of tier assignments, user history data, labeled data, etc. The computing device 900 also includes an input interface 910 that allows external devices to communicate with the computing device 900. For instance, the input interface 910 may be used to receive queries from a user by way of a network. The computing device 900 also includes an output interface 912 that interfaces the computing device 900 with one or more external devices. For example, the computing device 900 may display search results by way of the output interface 912.

Additionally, while illustrated as a single system, it is to be understood that the computing device 900 may be a distributed system. Thus, for instance, several devices may be in communication by way of a network connection and may collectively perform tasks described as being performed by the computing device 900.

As used herein, the terms “component” and “system” are intended to encompass hardware, software, or a combination of hardware and software. Thus, for example, a system or component may be a process, a process executing on a processor, or a processor. Additionally, a component or system may be localized on a single device or distributed across several devices.

It is noted that several examples have been provided for purposes of explanation. These examples are not to be construed as limiting the hereto-appended claims. Additionally, it may be recognized that the examples provided herein may be permutated while still falling under the scope of the claims.

What is claimed is:
 1. A method comprising: storing a search engine index across multiple storage tiers in accordance with a tier assignment policy, the multiple storage tiers comprise a first storage tier and a second storage tier, the first storage tier associated with a first access speed and the second storage tier associated with a second access speed that is slower than the first access speed, the tier assignment policy based upon observed user interaction with a search engine; and executing a search over at least a portion of the search engine index in response to receipt of a query, wherein search results retrieved when executing the search over the search index are based upon the tier assignment policy.
 2. The method of claim 1, wherein executing the search over at least the portion of the search engine index comprises executing the search over at least the portion of the search engine index based upon a load of the search engine at a time of receipt of the query.
 3. The method of claim 2, further comprising updating the observed user interaction with the search engine with the query and the load of the search engine at the time of receipt of the query.
 4. The method of claim 3, further comprising: updating the tier assignment policy responsive to updating the observed user interaction with the search engine; and storing the search engine index across the multiple storage tiers in response to updating the tier assignment policy.
 5. The method of claim 1, further comprising: determining the tier assignment policy based upon the observed user interaction with the search engine, the observed user interaction with the search engine comprises: queries previously submitted to the search engine; and tiers in the multiple storage tiers searched accessed by the search engine when searches were executed based upon the queries.
 6. The method of claim 5, wherein the observed user interaction with the search engine further comprises search results selected by users who issued the queries.
 7. The method of claim 1, further comprising determining the tier assignment policy based upon the observed user interaction with the search engine, the observed user interaction with the search engine comprises queries previously submitted to the search engine and respective loads of the search engine when the queries were submitted.
 8. The method of claim 1, further comprising determining the tier assignment policy based upon the observed user interaction with the search engine, the observed user interaction with the search engine comprises queries previously submitted to the search engine and respective weights assigned thereto, the weights are indicative of frequency of issuance of the queries.
 9. The method of claim 1, further comprising determining the tier assignment policy based upon the observed user interaction with the search engine, the observed user interaction with the search engine comprises values indicative of search result quality that respectively correspond to queries previously received at the search engine.
 10. A system comprising: at least one processor; and memory that stores instructions that, when executed by the at least one processor, cause the at least one processor to perform acts comprising: receiving a query; and executing a search over at least a portion of a search engine index in response to receipt of the query, the search engine index stored over multiple storage tiers of a tiered storage system in accordance with a tier assignment policy, the multiple storage tiers comprise data storage devices having different access speeds, the tier assignment policy defines which data storage devices store respective portions of the search engine index, and wherein executing the search comprises returning search results based upon the query and the tier assignment policy.
 11. The system of claim 10, wherein executing the search comprises searching over the portion of the search engine index based upon a computational load of the search engine at a time that the query is received.
 12. The system of claim 10, the data storage devices comprise different storage capacities.
 13. The system of claim 10, the portion of the search engine index is stored in a first data storage device having a first access speed, and wherein executing the search comprises: searching over the portion of the search engine index stored in the first data storage device in response to receipt of the query; and refraining from searching over a second portion of the search engine index stored in a second data storage device based upon a load of the search engine at a time that the query is received.
 14. The system of claim 10, wherein the tier assignment policy is based upon static rankings assigned to documents indexed by the search engine index.
 15. The system of claim 10, wherein the tier assignment policy is based upon numbers of words in documents indexed by the search engine index.
 16. The system of claim 10, wherein the tier assignment policy is based upon values that are indicative of qualities of tier assignments of documents that are indexed by the search engine index.
 17. A computer-readable data storage device comprising instructions that, when executed by a processor, cause the processor to perform acts comprising: storing a portion of a search engine index in a data storage device in a tiered data storage system, the data storage device has a first access speed, the tiered data storage system comprises the data storage device and a second data storage device that has a second access speed that is different from the first access speed, the second data storage device comprises a second portion of the search engine index, wherein the portion of the search engine index is stored in the data storage device based upon a tier assignment policy and the second portion of the search engine index is stored in the second data storage device based upon the tier assignment policy; and executing a search over at least the portion of the search engine index in the data storage device in response to receipt of a query, wherein executing the search comprises returning search results based upon the query and the tier assignment policy.
 18. The computer-readable data storage device of claim 17, wherein the tier assignment policy is based upon loads observed at the search engine when queries were received by the search engine.
 19. The computer-readable data storage device of claim 17, the acts further comprising updating the tier assignment policy based upon the search results.
 20. The computer-readable data storage device of claim 17, wherein executing the search over at least the portion of the search engine index comprises searching over the portion of the search engine index in the data storage device while refraining from searching over the second portion of the search engine index in the second data storage device.